Join post-HNSW block rescore for diversifying child KNN (POC)#16047

Open
iprithv wants to merge 2 commits into
apache:mainfrom
iprithv:knn-block-score

@iprithv
Contributor

@iprithv iprithv commented May 10, 2026

Description

This implements a proof of concept for sibling scoring in block join diversified child vector search, discussed in #15839 — Maybe Improve join block Vector search performance by block scoring child vectors.
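
To make the idea concrete, here is a minimal, self-contained sketch of a post-HNSW block rescore. It is not the code in this PR and uses no Lucene APIs: `BlockRescoreSketch`, `ParentHit`, `rescoreBlocks`, and the `vectors` array layout are all hypothetical names chosen for illustration. It assumes that children occupy the contiguous doc IDs immediately before their parent, that a parent is marked by a `null` vector, and that "rescore" means scoring every child in the block exactly and keeping the best one.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockRescoreSketch {

  /** Hypothetical hit type: a parent doc, its best known child, and that child's score. */
  public record ParentHit(int parentDoc, int bestChild, float score) {}

  /** Dot product; stands in for whatever similarity function the field uses. */
  public static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /**
   * Rescores each parent hit by scoring every child in its block.
   * Assumption: vectors[doc] is the child vector, or null when doc is a parent.
   */
  public static List<ParentHit> rescoreBlocks(
      List<ParentHit> hits, float[][] vectors, float[] query) {
    List<ParentHit> out = new ArrayList<>();
    for (ParentHit hit : hits) {
      int best = hit.bestChild();
      float bestScore = hit.score();
      // Walk backwards from the parent until the previous parent or the start of the index.
      for (int doc = hit.parentDoc() - 1; doc >= 0 && vectors[doc] != null; doc--) {
        float s = dot(query, vectors[doc]);
        if (s > bestScore) {
          bestScore = s;
          best = doc;
        }
      }
      out.add(new ParentHit(hit.parentDoc(), best, bestScore));
    }
    // Re-rank parents by their corrected score, highest first.
    out.sort((a, b) -> Float.compare(b.score(), a.score()));
    return out;
  }
}
```

In this sketch the graph traversal is untouched: only the final hit list is corrected, so a parent whose best child was never visited can still move up in the ranking, but the set of parents that HNSW found does not change.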


Benchmarks (JMH)

DiversifyingChildrenFloatKnnJoinBenchmark: 3 forks, 5 warmup / 10 measurement (30 samples/cell), -Xmx2g, dim 96, topK 64, 4096 parent blocks.

| children/parent | rescoreBlocks=false | rescoreBlocks=true |
|---|---|---|
| 8 | 0.117 ± 0.002 ms/op | 0.149 ± 0.002 ms/op |
| 32 | 0.237 ± 0.006 ms/op | 0.326 ± 0.009 ms/op |
| 64 | 0.259 ± 0.005 ms/op | 0.426 ± 0.013 ms/op |

The rescoring overhead grows with block width: about 27% at 8 children per parent, 38% at 32, and 64% at 64. It also grows with topK.

iprithv added 2 commits May 11, 2026 01:49
Optional blockRescore on DiversifyingChildren float/byte KNN; shared blockRescore()
with visited accounting; tests; JMH benchmark; CHANGES (Improvements, GITHUB#15839).

Relates to apache#15839
@iprithv iprithv force-pushed the knn-block-score branch from 07df8cb to 103e627 Compare May 10, 2026 20:20
@iprithv
Contributor Author

iprithv commented May 13, 2026

@benwtrent, I'd like to get your thoughts on this. Thanks!

@benwtrent
Member

This doesn't address that issue.

This is a better and more complete solution that better shows the idea: #16034

@iprithv
Contributor Author

iprithv commented May 14, 2026

@benwtrent Thanks for the pointer to #16034. Agreed, it's closer to what the issue describes: sibling scoring happens inline during HNSW traversal and can affect early termination, while this POC only rescores parents after HNSW finishes (so it can fix parent score correctness but doesn't change traversal).

A couple of clarifying questions before I close this out:

  1. Is there still value in a much simpler post-HNSW path (no core API changes, no docId→ordinal cache) as a lower-overhead option for users who want correct parent ranking but don't need the recall benefits tied to early termination? Or do you see that as too narrow to be worth a separate code path?

  2. Is anything from the JMH harness here (DiversifyingChildrenFloatKnnJoinBenchmark) worth porting over?

Thanks!
